A simplified convergence theory for Byzantine resilient stochastic gradient descent
Abstract
In distributed learning, a central server trains a model according to updates provided by nodes holding local data samples. In the presence of one or more malicious servers sending incorrect information (a Byzantine adversary), standard algorithms for model training such as stochastic gradient descent (SGD) fail to converge. In this paper, we present a simplified convergence theory for the generic Byzantine Resilient SGD method originally proposed by Blanchard et al. (2017) [3]. Compared to the existing analysis, we show convergence to a stationary point in expectation under assumptions on the (possibly nonconvex) objective function and flexible assumptions on the stochastic gradients.
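As a rough illustration of the setting described in the abstract, the sketch below runs SGD in which the server replaces plain averaging with Krum, one aggregation rule introduced by Blanchard et al. (2017). It is a minimal sketch under stated assumptions, not the method analysed in the paper: the quadratic objective, the Gaussian noise model, the fixed Byzantine attack vector, and the names `krum`, `resilient_sgd`, `n_workers`, and `n_byzantine` are all hypothetical choices made for the example.

```python
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors given as lists."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def krum(gradients, f):
    """Select the gradient whose n - f - 2 nearest neighbours are closest.

    `gradients` is a list of n vectors, `f` the assumed number of Byzantine workers.
    """
    n = len(gradients)
    if n <= 2 * f + 2:
        raise ValueError("Krum requires n > 2f + 2")
    best_idx, best_score = None, float("inf")
    for i, g in enumerate(gradients):
        # Distances from worker i to every other worker, smallest first.
        dists = sorted(sq_dist(g, h) for j, h in enumerate(gradients) if j != i)
        score = sum(dists[: n - f - 2])
        if score < best_score:
            best_idx, best_score = i, score
    return gradients[best_idx]


def resilient_sgd(x0, n_workers=7, n_byzantine=2, steps=200, lr=0.1, seed=0):
    """SGD on the toy objective 0.5 * ||x||^2 with Krum aggregation (assumed setup)."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        # Honest workers: true gradient (equal to x here) plus Gaussian noise.
        honest = [[xi + rng.gauss(0.0, 0.1) for xi in x]
                  for _ in range(n_workers - n_byzantine)]
        # Byzantine workers: an arbitrary, large, misleading vector (hypothetical attack).
        byzantine = [[100.0 for _ in x] for _ in range(n_byzantine)]
        g = krum(honest + byzantine, n_byzantine)
        x = [xi - lr * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    print(resilient_sgd([5.0, -3.0]))
```

In this toy run the iterate should end up close to the stationary point at the origin, whereas averaging all seven reported gradients would be pulled far away by the two Byzantine vectors.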
Similar resources
Byzantine Stochastic Gradient Descent
This paper studies the problem of distributed stochastic optimization in an adversarial setting where, out of the m machines which allegedly compute stochastic gradients every iteration, an α-fraction are Byzantine, and can behave arbitrarily and adversarially. Our main result is a variant of stochastic gradient descent (SGD) which finds ε-approximate minimizers of convex functions in T = Õ ( 1...
Convergence of Stochastic Gradient Descent for PCA
We consider the problem of principal component analysis (PCA) in a streaming stochastic setting, where our goal is to find a direction of approximate maximal variance, based on a stream of i.i.d. data points in R^d. A simple and computationally cheap algorithm for this is stochastic gradient descent (SGD), which incrementally updates its estimate based on each new data point. However, due to the ...
Convergence Analysis of Gradient Descent Stochastic Algorithms
This paper proves convergence of a sample-path based stochastic gradient-descent algorithm for optimizing expected-value performance measures in discrete event systems. The algorithm uses increasing precision at successive iterations, and it moves against the direction of a generalized gradient of the computed sample performance function. Two convergence results are established: one, for the ca...
Quantized Stochastic Gradient Descent: Communication versus Convergence
Parallel implementations of stochastic gradient descent (SGD) have received significant research attention, thanks to excellent scalability properties of this algorithm, and to its efficiency in the context of training deep neural networks. A fundamental barrier for parallelizing large-scale SGD is the fact that the cost of communicating the gradient updates between nodes can be very la...
Convergence diagnostics for stochastic gradient descent with constant step size
Iterative procedures in stochastic optimization are typically comprised of a transient phase and a stationary phase. During the transient phase the procedure converges towards a region of interest, and during the stationary phase the procedure oscillates in a convergence region, commonly around a single point. In this paper, we develop a statistical diagnostic test to detect such phase transiti...
Journal
Journal title: EURO Journal on Computational Optimization
Year: 2022
ISSN: 2192-4406, 2192-4414
DOI: https://doi.org/10.1016/j.ejco.2022.100038